1. Data

Information about the test:

question description
A1 linear function
A2 3D
A3 arithmetic rules
A4 Easy ineq.
A5 logs
A6 logs
A7 graph translation
A8 sine pi/3
A9 trig.ineq.
A10 trig.identity
A11 period
A12 rational exponents
A13 ellipsoid
A14 limit
A15 series
A16 diff.quotient
A17 graph f'
A18 product rule
A19 quotient rule
A20 (exp)'
A21 (ln(sin))'
A22 hyp.functions
A23 slope tangent
A24 IVT
A25 velocity
A26 int(poly)
A27 int(exp)
A28 Int = 0
A29 int even funct
A31 int(abs)
A32 FtoC algebra
A33 difference vectors
A35 intersect planes
A36 parallel planes
A30 FtoC graph
A34 normal vector

Load the student scores for the test - here we load the 2017 and 2018 ETH Zurich test data:

test_scores
## # A tibble: 3,433 x 38
##    year  class    A1    A2    A3    A4    A5    A6    A7    A8    A9   A10   A11
##    <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 2017  s21t~     1     0     1     1     1     0     1     0     1     0     1
##  2 2017  s21t~     1     0     1     1     1     1     0     1     1     2     1
##  3 2017  s21t~     1     0     0     0     1     1     1     0     1     1     1
##  4 2017  s21t~     1     0     1     1     1     1     1     1     1     1     1
##  5 2017  s21t~     1     0     1     0     2     0     1     0     1     0     2
##  6 2017  s21t~     0     1     0     0     1     2     0     2     2     2     1
##  7 2017  s21t~     1     0     1     0     2     1     0     1     2     1     0
##  8 2017  s21t~     1     1     1     1     2     1     1     2     2     2     2
##  9 2017  s21t~     1     1     0     1     1     1     1     1     1     1     1
## 10 2017  s21t~     1     0     1     0     0     1     0     0     1     1     1
## # ... with 3,423 more rows, and 25 more variables: A12 <dbl>, A13 <dbl>,
## #   A14 <dbl>, A15 <dbl>, A16 <dbl>, A17 <dbl>, A18 <dbl>, A19 <dbl>,
## #   A20 <dbl>, A21 <dbl>, A22 <dbl>, A23 <dbl>, A24 <dbl>, A25 <dbl>,
## #   A26 <dbl>, A27 <dbl>, A28 <dbl>, A29 <dbl>, A30 <dbl>, A31 <dbl>,
## #   A32 <dbl>, A33 <dbl>, A34 <dbl>, A35 <dbl>, A36 <dbl>

Missing data

The data includes scores of 2 for the “I don’t know” answer option. We replace these with 0, reflecting a non-correct answer:

test_scores <- test_scores %>% 
  mutate(across(starts_with("A"), ~ ifelse(. == 2, 0, .)))

Data summary

The number of responses from each class:

test_scores %>% 
  group_by(year, class) %>% 
  tally() %>% 
  gt() %>% 
  data_color(
    columns = c("n"),
    colors = scales::col_numeric(palette = c("Blues"), domain = NULL)
  )
## Warning: The `.dots` argument of `group_by()` is deprecated as of dplyr 1.0.0.
class n
2017
s21t-000-2017 1682
2018
s21t-000-2018 1751

Mean and standard deviation for each item:

test_scores %>% 
  select(-class) %>% 
  group_by(year) %>% 
  skim_without_charts() %>% 
  select(-contains("character."), -contains("numeric.p"), -skim_type) %>% 
  group_by(year) %>% 
  gt() %>% 
  fmt_number(columns = contains("numeric"), decimals = 3) %>%
  data_color(
    columns = c("numeric.mean"),
    colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
  ) %>%
  cols_label(
    numeric.mean = "Mean",
    numeric.sd = "SD"
  )
skim_variable n_missing complete_rate Mean SD
2017
A1 0 1 0.949 0.219
A2 0 1 0.306 0.461
A3 0 1 0.623 0.485
A4 0 1 0.643 0.479
A5 0 1 0.467 0.499
A6 0 1 0.705 0.456
A7 0 1 0.706 0.456
A8 0 1 0.474 0.499
A9 0 1 0.463 0.499
A10 0 1 0.543 0.498
A11 0 1 0.575 0.495
A12 0 1 0.681 0.466
A13 0 1 0.375 0.484
A14 0 1 0.559 0.497
A15 0 1 0.461 0.499
A16 0 1 0.185 0.389
A17 0 1 0.588 0.492
A18 0 1 0.380 0.486
A19 0 1 0.353 0.478
A20 0 1 0.725 0.447
A21 0 1 0.509 0.500
A22 0 1 0.516 0.500
A23 0 1 0.412 0.492
A24 0 1 0.587 0.493
A25 0 1 0.540 0.499
A26 0 1 0.797 0.403
A27 0 1 0.390 0.488
A28 0 1 0.420 0.494
A29 0 1 0.484 0.500
A30 0 1 0.756 0.429
A31 0 1 0.376 0.484
A32 0 1 0.227 0.419
A33 0 1 0.778 0.416
A34 0 1 0.580 0.494
A35 0 1 0.324 0.468
A36 0 1 0.220 0.414
2018
A1 0 1 0.953 0.213
A2 0 1 0.282 0.450
A3 0 1 0.642 0.480
A4 0 1 0.624 0.485
A5 0 1 0.491 0.500
A6 0 1 0.734 0.442
A7 0 1 0.704 0.457
A8 0 1 0.496 0.500
A9 0 1 0.460 0.499
A10 0 1 0.541 0.498
A11 0 1 0.581 0.494
A12 0 1 0.712 0.453
A13 0 1 0.357 0.479
A14 0 1 0.573 0.495
A15 0 1 0.462 0.499
A16 0 1 0.200 0.400
A17 0 1 0.601 0.490
A18 0 1 0.363 0.481
A19 0 1 0.342 0.474
A20 0 1 0.723 0.448
A21 0 1 0.520 0.500
A22 0 1 0.516 0.500
A23 0 1 0.430 0.495
A24 0 1 0.576 0.494
A25 0 1 0.731 0.444
A26 0 1 0.805 0.397
A27 0 1 0.400 0.490
A28 0 1 0.455 0.498
A29 0 1 0.516 0.500
A30 0 1 0.729 0.444
A31 0 1 0.371 0.483
A32 0 1 0.203 0.403
A33 0 1 0.760 0.427
A34 0 1 0.615 0.487
A35 0 1 0.332 0.471
A36 0 1 0.228 0.420

2. Testing assumptions

Before applying IRT, we should check that the data satisfies the assumptions needed by the model. In particular, to use a 1-dimensional IRT model, we should have some evidence of unidimensionality in the test scores.

Inter-item correlations

If the test is unidimensional then we would expect student scores on pairs of items to be correlated.

This plot shows the correlations between scores on each pair of items:

item_scores <- test_scores %>% 
  select(-class, -year)

cor_ci <- psych::corCi(item_scores, plot = FALSE)

psych::cor.plot.upperLowerCi(cor_ci)

There are a few correlations that are not significantly different from 0:

cor_ci$ci %>% 
  as_tibble(rownames = "corr") %>% 
  filter(p > 0.05) %>% 
  arrange(-p) %>% 
  select(-contains(".e")) %>% 
  gt() %>% 
  fmt_number(columns = 2:4, decimals = 3)
corr lower upper p
A1-A2 −0.006 0.058 0.106

The overall picture is that the item scores are well correlated with each other.

Dimensionality

structure <- check_factorstructure(item_scores)
n <- n_factors(item_scores)

Is the data suitable for Factor Analysis?

  • KMO: The Kaiser, Meyer, Olkin (KMO) measure of sampling adequacy suggests that data seems appropriate for factor analysis (KMO = 0.95).
  • Sphericity: Bartlett’s test of sphericity suggests that there is sufficient significant correlation in the data for factor analysis (Chisq(630) = 30319.01, p < .001).

Method Agreement Procedure:

The choice of 6 dimensions is supported by 5 (21.74%) methods out of 23 (Optimal coordinates, Parallel analysis, Kaiser criterion, BIC, BIC).

plot(n)

summary(n) %>% gt()
n_Factors n_Methods
1 4
2 1
3 1
4 1
6 5
8 1
11 3
26 1
28 1
33 1
34 1
35 3
#n %>% tibble() %>% gt()
fa.parallel(item_scores, fa = "fa")

## Parallel analysis suggests that the number of factors =  9  and the number of components =  NA

1 Factor

We use the factanal function to fit a 1-factor model.

Note that this function cannot handle missing data, so any NA scores must be set to 0 for this analysis.

fitfact <- factanal(item_scores,
                    factors = 1,
                    rotation = "varimax")
print(fitfact, digits = 2, cutoff = 0.3, sort = TRUE)
## 
## Call:
## factanal(x = item_scores, factors = 1, rotation = "varimax")
## 
## Uniquenesses:
##   A1   A2   A3   A4   A5   A6   A7   A8   A9  A10  A11  A12  A13  A14  A15  A16 
## 0.95 0.95 0.75 0.79 0.72 0.74 0.79 0.79 0.61 0.65 0.67 0.88 0.91 0.73 0.87 0.81 
##  A17  A18  A19  A20  A21  A22  A23  A24  A25  A26  A27  A28  A29  A30  A31  A32 
## 0.78 0.78 0.78 0.63 0.59 0.61 0.64 0.73 0.94 0.69 0.63 0.64 0.85 0.81 0.78 0.78 
##  A33  A34  A35  A36 
## 0.88 0.86 0.79 0.84 
## 
## Loadings:
##  [1] 0.53 0.51 0.62 0.59 0.57 0.52 0.61 0.64 0.62 0.60 0.52 0.56 0.61 0.60     
## [16]      0.50 0.45 0.46 0.46 0.35 0.31 0.36 0.44 0.47 0.46 0.47      0.39 0.44
## [31] 0.46 0.47 0.34 0.38 0.46 0.40
## 
##                Factor1
## SS loadings       8.35
## Proportion Var    0.23
## 
## Test of the hypothesis that 1 factor is sufficient.
## The chi square statistic is 5447.17 on 594 degrees of freedom.
## The p-value is 0
load <- tidy(fitfact)

ggplot(load, aes(x = fl1, y = 0)) + 
  geom_point() + 
  geom_label_repel(aes(label = paste0("A", rownames(load))), show.legend = FALSE) +
  labs(x = "Factor 1", y = NULL,
       title = "Standardised Loadings", 
       subtitle = "Based upon correlation matrix") +
  theme_minimal()
## Warning: ggrepel: 14 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

load %>% 
  select(question = variable, factor_loading = fl1) %>% 
  left_join(item_info, by = "question") %>% 
  arrange(-factor_loading) %>% 
  gt() %>%
  data_color(
    columns = contains("factor"),
    colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
  )
question factor_loading description
A21 0.6407106 (ln(sin))'
A9 0.6228304 trig.ineq.
A22 0.6223752 hyp.functions
A27 0.6076260 int(exp)
A20 0.6061690 (exp)'
A23 0.5998775 slope tangent
A28 0.5961137 Int = 0
A10 0.5878608 trig.identity
A11 0.5709488 period
A26 0.5590386 int(poly)
A5 0.5324571 logs
A14 0.5232019 limit
A24 0.5215894 IVT
A6 0.5086087 logs
A3 0.4972984 arithmetic rules
A19 0.4718772 quotient rule
A17 0.4705175 graph f'
A32 0.4683131 FtoC algebra
A18 0.4646006 product rule
A31 0.4643091 int(abs)
A8 0.4602708 sine pi/3
A7 0.4593494 graph translation
A35 0.4579597 intersect planes
A4 0.4547066 Easy ineq.
A30 0.4376910 FtoC graph
A16 0.4367742 diff.quotient
A36 0.4014553 parallel planes
A29 0.3875447 int even funct
A34 0.3765420 normal vector
A15 0.3603866 series
A12 0.3513586 rational exponents
A33 0.3415341 difference vectors
A13 0.3075858 ellipsoid
A25 0.2364578 velocity
A2 0.2342048 3D
A1 0.2277957 linear function

TODO Meike to offer some comments here about whether there is any apparent meaning in the factor

6 Factor

Here we also investigate the 6-factor solution, to see whether these factors are interpretable.

fitfact6 <- factanal(item_scores, factors = 6, rotation = "varimax")
print(fitfact6, digits = 2, cutoff = 0.3, sort = TRUE)
## 
## Call:
## factanal(x = item_scores, factors = 6, rotation = "varimax")
## 
## Uniquenesses:
##   A1   A2   A3   A4   A5   A6   A7   A8   A9  A10  A11  A12  A13  A14  A15  A16 
## 0.91 0.81 0.74 0.75 0.68 0.74 0.72 0.76 0.53 0.64 0.64 0.87 0.80 0.71 0.82 0.72 
##  A17  A18  A19  A20  A21  A22  A23  A24  A25  A26  A27  A28  A29  A30  A31  A32 
## 0.69 0.41 0.00 0.47 0.53 0.59 0.60 0.63 0.91 0.58 0.60 0.61 0.83 0.71 0.76 0.75 
##  A33  A34  A35  A36 
## 0.84 0.70 0.56 0.70 
## 
## Loadings:
##     Factor1 Factor2 Factor3 Factor4 Factor5 Factor6
## A9   0.54    0.37                                  
## A23  0.50                                          
## A28  0.50                                          
## A24          0.52                                  
## A20                  0.59                          
## A18                          0.69                  
## A19                          0.95                  
## A35                                  0.58          
## A1                                                 
## A2                                           0.40  
## A3   0.31                                          
## A4           0.33                                  
## A5   0.38                                          
## A6                                                 
## A7           0.44                                  
## A8   0.39                                          
## A10  0.45                                          
## A11  0.43    0.36                                  
## A12                                                
## A13                                          0.34  
## A14  0.32                                          
## A15                                                
## A16  0.43                                          
## A17          0.48                                  
## A21  0.46            0.43                          
## A22  0.41            0.36                          
## A25                                                
## A26          0.34    0.48                          
## A27  0.45            0.32                          
## A29                                                
## A30          0.48                                  
## A31  0.33                                          
## A32  0.41                                          
## A33          0.33                                  
## A34                                  0.47          
## A36                                  0.43          
## 
##                Factor1 Factor2 Factor3 Factor4 Factor5 Factor6
## SS loadings        3.5    2.65    1.68    1.62    1.26    0.98
## Proportion Var     0.1    0.07    0.05    0.05    0.04    0.03
## Cumulative Var     0.1    0.17    0.22    0.26    0.30    0.32
## 
## Test of the hypothesis that 6 factors are sufficient.
## The chi square statistic is 932.36 on 429 degrees of freedom.
## The p-value is 2.36e-39
load6 <- tidy(fitfact6)

ggplot(load6, aes(x = fl1, y = fl2)) + 
  geom_point() + 
  geom_label_repel(aes(label = paste0("A", rownames(load))), show.legend = FALSE) +
  labs(x = "Factor 1", y = "Factor 2",
       title = "Standardised Loadings", 
       subtitle = "Based upon correlation matrix") +
  theme_minimal()

main_factors <- load6 %>% 
#  mutate(factorNone = 0.4) %>%  # add this to set the main factor to "None" where all loadings are below 0.4
  pivot_longer(names_to = "factor",
               cols = contains("fl")) %>% 
  mutate(value_abs = abs(value)) %>% 
  group_by(variable) %>% 
  top_n(1, value_abs) %>% 
  ungroup() %>% 
  transmute(main_factor = factor, variable)

library(kableExtra)
## 
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
## 
##     group_rows
load6 %>% 
  select(-uniqueness) %>% 
  # add the info about which is the main factor
  left_join(main_factors) %>%
  left_join(item_info %>% select(variable = question, description)) %>% 
  arrange(main_factor) %>% 
  select(main_factor, everything()) %>% 
  # arrange adjectives by descending loading on main factor
  rowwise() %>% 
  mutate(max_loading = max(abs(c_across(starts_with("fl"))))) %>% 
  group_by(main_factor) %>% 
  arrange(-max_loading, .by_group = TRUE) %>% 
  select(-max_loading) %>% 
  # sort out the presentation
  rename("Main Factor" = main_factor, # the _ throws a latex error
         "Question" = variable) %>%
  mutate_at(
    vars(starts_with("fl")),
    ~ cell_spec(round(., digits = 3), bold = if_else(abs(.) > 0.4, T, F))
  ) %>% 
  kable(booktabs = T, escape = F, longtable = T) %>% 
  kableExtra::collapse_rows(columns = 1, valign = "top") %>%
  kableExtra::kable_styling(latex_options = c("repeat_header"))
## Joining, by = "variable"
## Joining, by = "variable"
Main Factor Question fl1 fl2 fl3 fl4 fl5 fl6 description
fl1 A9 0.543 0.371 0.045 0.1 0.13 0.086 trig.ineq.
A23 0.504 0.289 0.189 0.085 0.136 0.023 slope tangent
A28 0.503 0.292 0.145 0.094 0.147 0.048 Int = 0
A21 0.456 0.194 0.429 0.13 0.143 0.07 (ln(sin))’
A27 0.453 0.193 0.316 0.086 0.216 0.075 int(exp)
A10 0.447 0.239 0.234 0.104 0.084 0.182 trig.identity
A16 0.427 -0.017 0.141 0.129 0.128 0.217 diff.quotient
A11 0.426 0.357 0.127 0.065 0.082 0.15 period
A22 0.408 0.237 0.36 0.108 0.21 0.073 hyp.functions
A32 0.407 0.122 0.106 0.104 0.19 0.115 FtoC algebra
A8 0.386 0.145 0.138 0.12 0.058 0.186 sine pi/3
A5 0.379 0.182 0.2 0.093 0.114 0.278 logs
A31 0.325 0.262 0.059 0.07 0.178 0.151 int(abs)
A14 0.324 0.176 0.266 0.118 0.18 0.18 limit
A3 0.313 0.212 0.247 0.085 0.08 0.213 arithmetic rules
A6 0.293 0.268 0.257 0.072 0.122 0.136 logs
fl2 A24 0.272 0.516 0.074 0.079 0.134 0.049 IVT
A30 0.171 0.484 0.086 0.051 0.094 0.08 FtoC graph
A17 0.193 0.478 0.118 0.073 0.061 0.137 graph f’
A7 0.167 0.443 0.157 0.036 0.095 0.161 graph translation
A4 0.225 0.334 0.118 0.064 0.084 0.249 Easy ineq.
A33 0.101 0.328 0.089 0.036 0.113 0.158 difference vectors
A1 0.046 0.27 0.089 0.027 0.029 0.06 linear function
A25 0.056 0.264 0.099 0.009 0.055 0.054 velocity
A12 0.185 0.191 0.143 0.083 0.072 0.168 rational exponents
fl3 A20 0.286 0.268 0.593 0.09 0.115 0.059 (exp)’
A26 0.221 0.338 0.476 0.11 0.142 0.002 int(poly)
A29 0.23 0.126 0.27 0.093 0.145 0.03 int even funct
fl4 A19 0.227 0.071 0.129 0.951 0.101 0.085 quotient rule
A18 0.211 0.138 0.135 0.692 0.114 0.106 product rule
fl5 A35 0.216 0.133 0.114 0.104 0.583 0.102 intersect planes
A34 0.109 0.185 0.169 0.057 0.47 0.006 normal vector
A36 0.223 0.106 0.06 0.045 0.43 0.212 parallel planes
fl6 A2 0.093 0.117 0.011 0.046 0.024 0.403 3D
A13 0.099 0.255 -0.006 0.044 0.106 0.339 ellipsoid
A15 0.154 0.233 0.084 0.023 0.143 0.28 series

TODO Meike to comment

3. Fitting 2 parameter logistic MIRT model

We can fit a Multidimensional Item Response Theory (mirt) model. From the function definition:

mirt fits a maximum likelihood (or maximum a posteriori) factor analysis model to any mixture of dichotomous and polytomous data under the item response theory paradigm using either Cai's (2010) Metropolis-Hastings Robbins-Monro (MHRM) algorithm.

The process is to first fit the model, and save the result as a model object that we can then parse to get tabular or visual displays of the model that we might want. When fitting the model, we have the option to specify a few arguments, which then get interpreted as parameters to be passed to the model.

fit_2pl <- mirt(
  data = item_scores, # just the columns with question scores
  model = 1,          # number of factors to extract
  itemtype = "2PL",   # 2 parameter logistic model
  SE = TRUE           # estimate standard errors
  )
## 
Iteration: 1, Log-Lik: -66668.278, Max-Change: 0.71582
Iteration: 2, Log-Lik: -65483.178, Max-Change: 0.26188
Iteration: 3, Log-Lik: -65316.598, Max-Change: 0.14767
Iteration: 4, Log-Lik: -65249.858, Max-Change: 0.11072
Iteration: 5, Log-Lik: -65212.345, Max-Change: 0.09658
Iteration: 6, Log-Lik: -65190.906, Max-Change: 0.06270
Iteration: 7, Log-Lik: -65178.102, Max-Change: 0.06095
Iteration: 8, Log-Lik: -65169.976, Max-Change: 0.03668
Iteration: 9, Log-Lik: -65165.338, Max-Change: 0.03883
Iteration: 10, Log-Lik: -65162.425, Max-Change: 0.02443
Iteration: 11, Log-Lik: -65160.764, Max-Change: 0.01661
Iteration: 12, Log-Lik: -65159.750, Max-Change: 0.01247
Iteration: 13, Log-Lik: -65158.047, Max-Change: 0.00237
Iteration: 14, Log-Lik: -65158.024, Max-Change: 0.00195
Iteration: 15, Log-Lik: -65158.012, Max-Change: 0.00168
Iteration: 16, Log-Lik: -65157.995, Max-Change: 0.00082
Iteration: 17, Log-Lik: -65157.990, Max-Change: 0.00089
Iteration: 18, Log-Lik: -65157.986, Max-Change: 0.00072
Iteration: 19, Log-Lik: -65157.976, Max-Change: 0.00018
Iteration: 20, Log-Lik: -65157.975, Max-Change: 0.00020
Iteration: 21, Log-Lik: -65157.975, Max-Change: 0.00019
Iteration: 22, Log-Lik: -65157.974, Max-Change: 0.00012
Iteration: 23, Log-Lik: -65157.973, Max-Change: 0.00012
Iteration: 24, Log-Lik: -65157.973, Max-Change: 0.00011
Iteration: 25, Log-Lik: -65157.973, Max-Change: 0.00009
## 
## Calculating information matrix...

Local independence

We compute Yen’s \(Q_3\) (1984) to check for any dependence between items after controlling for \(\theta\). This gives a score for each pair of items, with scores above 0.2 regarded as problematic (see DeMars, p. 48).

residuals_2pl  %>% as.matrix() %>% 
  corrplot::corrplot(type = "upper")

This shows that most item pairs are independent, with only one pair showing cause for concern:

residuals_2pl %>%
  rownames_to_column(var = "q1") %>%
  as_tibble() %>% 
  pivot_longer(cols = starts_with("A"), names_to = "q2", values_to = "Q3_score") %>% 
  filter(abs(Q3_score) > 0.2) %>% 
  filter(parse_number(q1) < parse_number(q2)) %>%
  gt()
q1 q2 Q3_score
A18 A19 0.6740812
A34 A35 0.2109123

Items A18 and A19 are on the product and quotient rules.

Given that this violation of the local independence assumption is very mild, we proceed using this model.

Model parameters

We then compute factor score estimates and augment the existing data frame with these estimates, to keep everything in one place. To do the estimation, we use the fscores() function from the mirt package which takes in a computed model object and computes factor score estimates according to the method specified. We will use the EAP method for factor score estimation, which is the “expected a-posteriori” method, the default. We specify it explicitly below, but the results would have been the same if we omitted specifying the method argument since it’s the default method the function uses.

test_scores <- test_scores %>%
  mutate(F1 = fscores(fit_2pl, method = "EAP"))

We can also calculate the model coefficient estimates using a generic function coef() which is used to extract model coefficients from objects returned by modeling functions. We will set the IRTpars argument to TRUE, which means slope intercept parameters will be converted into traditional IRT parameters.

coefs_2pl <- coef(fit_2pl, IRTpars = TRUE)

The resulting object coefs is a list, with one element for each question, and an additional GroupPars element that we won’t be using. The output is a bit long, so we’re only showing a few of the elements here:

coefs_2pl[1:3]
## $A1
##                a         b  g  u
## par     1.219104 -2.946041  0  1
## CI_2.5  1.024917 -3.298279 NA NA
## CI_97.5 1.413291 -2.593803 NA NA
## 
## $A2
##                 a        b  g  u
## par     0.5612745 1.670103  0  1
## CI_2.5  0.4752132 1.405353 NA NA
## CI_97.5 0.6473358 1.934852 NA NA
## 
## $A3
##                a          b  g  u
## par     1.354723 -0.5385656  0  1
## CI_2.5  1.236060 -0.6126183 NA NA
## CI_97.5 1.473386 -0.4645129 NA NA
# coefs_2pl[35:37]

Let’s take a closer look at the first element:

coefs_2pl[1]
## $A1
##                a         b  g  u
## par     1.219104 -2.946041  0  1
## CI_2.5  1.024917 -3.298279 NA NA
## CI_97.5 1.413291 -2.593803 NA NA

In this output:

  • a is discrimination
  • b is difficulty
  • endpoints of the 95% confidence intervals are also shown

To make this output a little more user friendly, we should tidy it such that we have a row per question. We’ll do this in two steps. First, write a function that tidies the output for one question, i.e. one list element. Then, map this function over the list of all questions, resulting in a data frame.

tidy_mirt_coefs <- function(x){
  x %>%
    # melt the list element
    melt() %>%
    # convert to a tibble
    as_tibble() %>%
    # convert factors to characters
    mutate(across(where(is.factor), as.character)) %>%
    # only focus on rows where X2 is a or b (discrimination or difficulty)
    filter(X2 %in% c("a", "b")) %>%
    # in X1, relabel par (parameter) as est (estimate)
    mutate(X1 = if_else(X1 == "par", "est", X1)) %>%
    # unite columns X2 and X1 into a new column called var separated by _
    unite(X2, X1, col = "var", sep = "_") %>%
    # turn into a wider data frame
    pivot_wider(names_from = var, values_from = value)
}

Let’s see what this does to a single element in coefs:

tidy_mirt_coefs(coefs_2pl[1])
## # A tibble: 1 x 7
##   L1    a_est a_CI_2.5 a_CI_97.5 b_est b_CI_2.5 b_CI_97.5
##   <chr> <dbl>    <dbl>     <dbl> <dbl>    <dbl>     <dbl>
## 1 A1     1.22     1.02      1.41 -2.95    -3.30     -2.59

And now let’s map it over all 32 elements of coefs:

tidy_2pl <- map_dfr(coefs_2pl[1:32], tidy_mirt_coefs, .id = "Question")

A quick peek at the result:

tidy_2pl
## # A tibble: 32 x 7
##    Question a_est a_CI_2.5 a_CI_97.5   b_est b_CI_2.5 b_CI_97.5
##    <chr>    <dbl>    <dbl>     <dbl>   <dbl>    <dbl>     <dbl>
##  1 A1       1.22     1.02      1.41  -2.95   -3.30      -2.59  
##  2 A2       0.561    0.475     0.647  1.67    1.41       1.93  
##  3 A3       1.35     1.24      1.47  -0.539  -0.613     -0.465 
##  4 A4       1.18     1.07      1.29  -0.587  -0.670     -0.505 
##  5 A5       1.47     1.34      1.59   0.0797  0.0159     0.143 
##  6 A6       1.56     1.42      1.70  -0.859  -0.938     -0.781 
##  7 A7       1.29     1.17      1.41  -0.885  -0.975     -0.795 
##  8 A8       1.16     1.06      1.27   0.0659 -0.00726    0.139 
##  9 A9       2.02     1.86      2.18   0.128   0.0731     0.183 
## 10 A10      1.77     1.63      1.91  -0.145  -0.203     -0.0867
## # ... with 22 more rows

And a nicely formatted table of the result:

gt(tidy_2pl) %>%
  fmt_number(columns = contains("_"), decimals = 3) %>%
  data_color(
    columns = contains("a_"),
    colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
  ) %>%
  data_color(
    columns = contains("b_"),
    colors = scales::col_numeric(palette = c("Blues"), domain = NULL)
  ) %>%
  tab_spanner(label = "Discrimination", columns = contains("a_")) %>%
  tab_spanner(label = "Difficulty", columns = contains("b_")) %>%
  cols_label(
    a_est = "Est.",
    b_est = "Est.",
    a_CI_2.5 = "2.5%",
    b_CI_2.5 = "2.5%",
    a_CI_97.5 = "97.5%",
    b_CI_97.5 = "97.5%"
  )
Question Discrimination Difficulty
Est. 2.5% 97.5% Est. 2.5% 97.5%
A1 1.219 1.025 1.413 −2.946 −3.298 −2.594
A2 0.561 0.475 0.647 1.670 1.405 1.935
A3 1.355 1.236 1.473 −0.539 −0.613 −0.465
A4 1.180 1.072 1.289 −0.587 −0.670 −0.505
A5 1.466 1.344 1.589 0.080 0.016 0.143
A6 1.558 1.420 1.695 −0.859 −0.938 −0.781
A7 1.287 1.168 1.406 −0.885 −0.975 −0.795
A8 1.162 1.057 1.268 0.066 −0.007 0.139
A9 2.016 1.856 2.176 0.128 0.073 0.183
A10 1.768 1.625 1.910 −0.145 −0.203 −0.087
A11 1.684 1.546 1.821 −0.278 −0.339 −0.217
A12 0.889 0.792 0.987 −1.088 −1.219 −0.957
A13 0.718 0.631 0.806 0.855 0.717 0.992
A14 1.435 1.314 1.556 −0.256 −0.322 −0.190
A15 0.850 0.759 0.940 0.209 0.116 0.302
A16 1.539 1.390 1.687 1.295 1.194 1.397
A17 1.227 1.117 1.337 −0.404 −0.479 −0.329
A18 1.237 1.125 1.350 0.550 0.471 0.629
A19 1.309 1.192 1.427 0.639 0.561 0.718
A20 2.326 2.126 2.525 −0.738 −0.799 −0.677
A21 2.137 1.967 2.306 −0.047 −0.100 0.006
A22 1.999 1.840 2.158 −0.053 −0.108 0.002
A23 1.875 1.724 2.026 0.268 0.211 0.326
A24 1.439 1.317 1.561 −0.314 −0.381 −0.247
A25 0.534 0.453 0.615 −1.128 −1.334 −0.922
A26 2.424 2.199 2.648 −1.035 −1.104 −0.966
A27 1.994 1.833 2.155 0.350 0.293 0.408
A28 1.850 1.702 1.999 0.212 0.154 0.269
A29 0.913 0.820 1.006 −0.000 −0.087 0.086
A30 1.263 1.142 1.384 −1.084 −1.186 −0.983
A31 1.250 1.138 1.363 0.539 0.461 0.617
A32 1.604 1.454 1.754 1.153 1.062 1.244
tidy_2pl %>% 
  mutate(qnum = parse_number(Question)) %>% 
  ggplot(aes(
    x = a_est,
    y = b_est
  )) +
  geom_errorbar(aes(ymin = b_CI_2.5, ymax = b_CI_97.5), width = 0, alpha = 0.5) +
  geom_errorbar(aes(xmin = a_CI_2.5, xmax = a_CI_97.5), width = 0, alpha = 0.5) +
  geom_text_repel(aes(label = Question), alpha = 0.5) +
  geom_point() +
  theme_minimal() +
  labs(x = "Discrimination",
       y = "Difficulty")

tidy_2pl %>% 
  write_csv("data-eth/OUT_2pl-results-pre-only.csv")

Comparing years and classes

Do students from different programmes of study have different distributions of ability?

Differences between years

Compare the distribution of abilities in the year groups (though in this case there is only one).

ggplot(test_scores, aes(F1, fill = as.factor(year), colour = as.factor(year))) +
  geom_density(alpha=0.5) + 
  scale_x_continuous(limits = c(-3.5,3.5)) +
  labs(title = "Density plot", 
       subtitle = "Ability grouped by year of taking the test", 
       x = "Ability", y = "Density",
       fill = "Year", colour = "Year") +
  theme_minimal()

Differences between classes

Compare the distribution of abilities in the various classes.

ggplot(test_scores, aes(x = F1, y = class, colour = class, fill = class)) +
  geom_density_ridges(alpha = 0.5) + 
  scale_x_continuous(limits = c(-3.5,3.5)) +
  guides(fill = FALSE, colour = FALSE) +
  labs(title = "Density plot", 
       subtitle = "Ability grouped by class of taking the test", 
       x = "Ability", y = "Class") +
  theme_minimal()
## Picking joint bandwidth of 0.194

Information curves

Test information curve

plot(fit_2pl, type = "infoSE", main = "Test information")

Item information curves

plot(fit_2pl, type = "infotrace", main = "Item information curves")

Response curves

Test response curves

plot(fit_2pl, type = "score", auto.key = FALSE)

Item response curves

We can get individual item surface and information plots using the itemplot() function from the mirt package, e.g.

mirt::itemplot(fit_2pl, item = 1, 
               main = "Trace lines for item 1")

We can also get the plots for all trace lines, one facet per plot.

plot(fit_2pl, type = "trace", auto.key = FALSE)

Or all of them overlaid in one plot.

plot(fit_2pl, type = "trace", facet_items=FALSE)

An alternative approach is using ggplot2 and plotly to add interactivity to make it easier to identify items.

# store the object
plt <- plot(fit_2pl, type = "trace", facet_items = FALSE)
# the data we need is in panel.args
# TODO - I had to change the [[1]] to [[2]] since the plt has two panels for some reason, with the one we want being the 2nd panel
plt_data <- tibble(
  x          = plt$panel.args[[2]]$x,
  y          = plt$panel.args[[2]]$y,
  subscripts = plt$panel.args[[2]]$subscripts,
  item       = rep(colnames(item_scores), each = 200)
) %>%
  mutate(
    item_no = str_remove(item, "A") %>% as.numeric(),
    item    = fct_reorder(item, item_no)
    )

head(plt_data)
## # A tibble: 6 x 5
##       x      y subscripts item  item_no
##   <dbl>  <dbl>      <int> <fct>   <dbl>
## 1 -6    0.0236        201 A1          1
## 2 -5.94 0.0253        202 A1          1
## 3 -5.88 0.0272        203 A1          1
## 4 -5.82 0.0292        204 A1          1
## 5 -5.76 0.0314        205 A1          1
## 6 -5.70 0.0337        206 A1          1
plt_gg <- ggplot(plt_data, aes(x, y, 
                          colour = item, 
                          text = item)) + 
  geom_line() + 
  labs(
    title = "2PL - Trace lines",
    #x = expression(theta),
    x = "theta",
    #y = expression(P(theta)),
    y = "P(theta)",
    colour = "Item"
  ) +
  theme_minimal() +
  theme(legend.position = "none")

ggplotly(plt_gg, tooltip = "text")
knitr::knit_exit()